M5 Forecast Predictions

Killian Cimino

2023-12-01

Context

  • What is the M5 Accuracy Competition?
    • M competitions have a 40 year history
    • Aim to assess how to improve forecasting
  • Dataset from Walmart’s hierarchical unit sales
  • 3,049 unique items; accounting for combinations with higher levels of aggregation, 42,840 time series in total.

General Look at the Data

There are 1,941 days' worth of data and 35 unique items in the subset of the dataset used for analysis. The data was restricted to a random sample of 10 items per department, since this is meant to be just a demonstration of what is possible.

  • There might be some seasonality, and there’s certainly a trend (though it isn’t perfectly linear)

General Look at the Data (cont.)

  • There is moderately strong weekly seasonality and a strong trend for a few of the departments.
  • The strength of the trend and seasonality differs across series, so the data will likely benefit from different models for different time series.

Baseline Model

  • Mean Forecast
    • Predicts the mean for each item for every future point in time.
  • Best Fit Forecast
    • Uses both traditional time series and machine learning methods
    • Creates a forecast for every item and picks the best based on RMSE
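The best-fit step can be sketched as follows; this is an illustrative sketch in Python rather than the project's actual (fable-based) code, and the holdout values and candidate forecasts are made up:

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length series."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def best_fit(actual, candidate_forecasts):
    """Pick the candidate model whose forecast has the lowest RMSE on the holdout."""
    return min(candidate_forecasts, key=lambda name: rmse(actual, candidate_forecasts[name]))

# Illustrative holdout values and two candidate forecasts for one item
actual = [3, 0, 2, 4, 1]
candidates = {
    "mean": [2.0] * 5,   # RMSE ~= 1.41
    "naive": [3] * 5,    # RMSE ~= 1.73
}
print(best_fit(actual, candidates))  # -> mean
```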

Why It Didn’t Work

  • This data is low volume intermittent demand at granular levels
    • Most traditional forecasting methods (and many ML ones) assume continuous data, and this assumption is not met by this data at the item level.
  • Low volume and intermittent demand data is notoriously difficult to forecast
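One standard approach for intermittent demand (and one of the models tested later) is Croston's method, which smooths the non-zero demand sizes and the intervals between them separately. A minimal sketch, with an illustrative smoothing parameter and demand series:

```python
def croston(demand, alpha=0.1):
    """Croston's method: returns a flat per-period demand forecast,
    the smoothed non-zero demand size divided by the smoothed interval."""
    size = None      # smoothed non-zero demand size
    interval = None  # smoothed gap between non-zero demands
    periods_since = 0
    for d in demand:
        periods_since += 1
        if d > 0:
            if size is None:  # initialize on the first non-zero demand
                size, interval = d, periods_since
            else:             # simple exponential smoothing updates
                size += alpha * (d - size)
                interval += alpha * (periods_since - interval)
            periods_since = 0
    return 0.0 if size is None else size / interval

# A demand of 3 every third day averages out to 1 unit per day
print(croston([0, 0, 3, 0, 0, 3, 0, 0, 3]))  # -> 1.0
```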

One potential solution

  • Hierarchical Forecasting

    • Creates a forecast for every node at every level in the hierarchy

    • Current iteration uses minimum trace (MinT) optimization based on the in-sample covariance for reconciliation.

Initial Results - Region

For the most part, the forecasts seem reasonable, although they drop off sharply at the last date for unclear reasons. It's possible there's an error in the code somewhere; this will require further investigation.

Initial Results - State

These forecasts seem to be missing something. The ETS and Croston forecasts are too flat, while the best fit, STLs, and stepwise ARIMA don't quite match the most recent weeks. There is room for improvement here, but they aren't excessively unreasonable.

Initial Results - Models

Overall, the MAE and RMSE tell a similar story: the STL models seem to be the best forecast for many of the items, while the mean forecast is often best for others.

The overall RMSE for the reconciled forecast is 2.54, which is a 52.29% improvement compared to the baseline forecast.

Next Steps

  • Test several reconciliation methods to find the best
  • Find a better way to find the best STL combination
  • Add a GARCH Model
  • See if treating the data as longitudinal could yield interesting results.

Appendix

Reconciliation

  • Four main options
    1. Top-down
       • loses information
       • reliable at aggregate level
    2. Bottom-up
       • noisy
       • does not lose information
    3. Middle-out
       • happy medium between top-down and bottom-up
    4. MinT
       • more computationally demanding
       • likely to be more accurate at all levels

Sample Hierarchy

United States
├── South Region
│   ├── Tennessee
│   │   ├── Knox County
│   │   └── Hamilton County
│   └── Florida
│       └── DeSoto County
└── Northeast Region
    └── Connecticut
        ├── Middlesex County
        └── Hartford County

Let’s say historically each node is split equally between the nodes below it, e.g. half of the units sold in the United States come from the Southern Region and half from the Northeastern Region.

Reconciliation - Top-down

[Diagram: the sample hierarchy from above, with the nodes numbered 1 (United States) through 11 (Hartford County) from the top down]

Forecast at the very top level, then use proportions to divide up the forecasts to lower nodes.

[Diagram: the same hierarchy, with the United States node replaced by its forecast of 110]

Let’s say we forecasted 110 for the top level, and wanted to use historical proportions where each node contributes equally to the node above.

110 (United States)
├── 55 (South Region)
│   ├── 27.5 (Tennessee)
│   │   ├── 13.75 (Knox County)
│   │   └── 13.75 (Hamilton County)
│   └── 27.5 (Florida)
│       └── 27.5 (DeSoto County)
└── 55 (Northeast Region)
    └── 55 (Connecticut)
        ├── 27.5 (Middlesex County)
        └── 27.5 (Hartford County)

You can just divvy up the forecast based on the expected proportions at each level, starting at the top and going down.
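The equal-proportion split above can be sketched in Python; the dictionary mirrors the sample hierarchy, and the equal split is an assumption carried over from the example:

```python
# Parent -> children mapping for the sample hierarchy
tree = {
    "United States": ["South Region", "Northeast Region"],
    "South Region": ["Tennessee", "Florida"],
    "Northeast Region": ["Connecticut"],
    "Tennessee": ["Knox County", "Hamilton County"],
    "Florida": ["DeSoto County"],
    "Connecticut": ["Middlesex County", "Hartford County"],
}

def top_down(root, forecast, tree):
    """Split a forecast equally among each node's children, recursively."""
    result = {root: forecast}
    for child in tree.get(root, []):
        result.update(top_down(child, forecast / len(tree[root]), tree))
    return result

values = top_down("United States", 110, tree)
print(values["Tennessee"], values["Knox County"])  # -> 27.5 13.75
```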

Reconciliation - Bottom-up

[Diagram: the sample hierarchy again, with no forecasts filled in yet]

Forecast at the bottom level, then just add them up to get the higher levels of aggregation.

[Diagram: the hierarchy with each county node replaced by its forecast of 5]

Let’s say you forecast 5 for each county.

25 (United States)
├── 15 (South Region)
│   ├── 10 (Tennessee)
│   │   ├── 5 (Knox County)
│   │   └── 5 (Hamilton County)
│   └── 5 (Florida)
│       └── 5 (DeSoto County)
└── 10 (Northeast Region)
    └── 10 (Connecticut)
        ├── 5 (Middlesex County)
        └── 5 (Hartford County)

You’d then add up each node to get the number for the node above until you get to the top of the hierarchy.
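The same aggregation can be sketched in Python; the hierarchy dictionary mirrors the sample hierarchy, and the leaf forecasts of 5 come from the example above:

```python
# Parent -> children mapping for the sample hierarchy
tree = {
    "United States": ["South Region", "Northeast Region"],
    "South Region": ["Tennessee", "Florida"],
    "Northeast Region": ["Connecticut"],
    "Tennessee": ["Knox County", "Hamilton County"],
    "Florida": ["DeSoto County"],
    "Connecticut": ["Middlesex County", "Hartford County"],
}

def bottom_up(node, leaf_forecasts, tree):
    """Sum leaf-level forecasts up through the hierarchy."""
    children = tree.get(node, [])
    if not children:  # leaf: take its forecast directly
        return {node: leaf_forecasts[node]}
    result, total = {}, 0
    for child in children:
        sub = bottom_up(child, leaf_forecasts, tree)
        result.update(sub)
        total += sub[child]
    result[node] = total  # each parent is the sum of its children
    return result

leaves = {c: 5 for c in ["Knox County", "Hamilton County", "DeSoto County",
                         "Middlesex County", "Hartford County"]}
values = bottom_up("United States", leaves, tree)
print(values["South Region"], values["United States"])  # -> 15 25
```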

Reconciliation - Middle-out

[Diagram: the sample hierarchy again, before any forecasts are filled in]

Forecast at some point in the middle, then add up to get higher levels of aggregation, and use proportions to divide up the forecasts to lower nodes. For this example we’ll use historical proportions.

[Diagram: the hierarchy with the state nodes replaced by their forecasts: Tennessee 10, Florida 15, Connecticut 30]

Let’s say you forecasted the states, 10 for Tennessee, 15 for Florida, and 30 for Connecticut.

[Diagram: the hierarchy with the county nodes also filled in: Knox 5, Hamilton 5, DeSoto 15, Middlesex 15, Hartford 15]

You would split each of those forecasts according to the historical proportions to get the values for the nodes below.

55 (United States)
├── 25 (South Region)
│   ├── 10 (Tennessee)
│   │   ├── 5 (Knox County)
│   │   └── 5 (Hamilton County)
│   └── 15 (Florida)
│       └── 15 (DeSoto County)
└── 30 (Northeast Region)
    └── 30 (Connecticut)
        ├── 15 (Middlesex County)
        └── 15 (Hartford County)

Then you would add to get the values of higher nodes.
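Middle-out combines the two motions: split downward from the states, then sum upward to the regions and country. A sketch under the same equal-proportion assumption as the example (state forecasts of 10, 15, and 30):

```python
# Parent -> children mapping for the sample hierarchy
tree = {
    "United States": ["South Region", "Northeast Region"],
    "South Region": ["Tennessee", "Florida"],
    "Northeast Region": ["Connecticut"],
    "Tennessee": ["Knox County", "Hamilton County"],
    "Florida": ["DeSoto County"],
    "Connecticut": ["Middlesex County", "Hartford County"],
}
state_forecasts = {"Tennessee": 10, "Florida": 15, "Connecticut": 30}

def split_down(node, value, tree, out):
    """Top-down step: split a node's value equally among its children."""
    out[node] = value
    children = tree.get(node, [])
    for child in children:
        split_down(child, value / len(children), tree, out)

def sum_up(node, tree, out):
    """Bottom-up step: fill in any node not yet set as the sum of its children."""
    if node in out:
        return out[node]
    out[node] = sum(sum_up(c, tree, out) for c in tree.get(node, []))
    return out[node]

values = {}
for state, f in state_forecasts.items():
    split_down(state, f, tree, values)   # counties get their shares
sum_up("United States", tree, values)    # regions and country get sums
print(values["DeSoto County"], values["South Region"], values["United States"])
```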

Reconciliation - MinT

[Diagram: the full sample hierarchy, United States down to the counties]

Forecast at every level and then use linear algebra to minimize the errors at every level. It does this by minimizing the trace of a covariance matrix of the reconciled forecast errors, and there are several estimators of that matrix you can use depending on the data you have. If you want to learn more, I’d suggest checking out Forecasting: Principles and Practice. It gives a good overview of the subject, and from there you can choose whether you want to read the papers on it (unfortunately I haven’t found any good videos that explain the math behind it to recommend).
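In matrix form, the reconciled forecasts are ỹ = S(SᵀW⁻¹S)⁻¹SᵀW⁻¹ŷ, where S is the summing matrix mapping bottom-level series to every level and W is an estimate of the base-forecast error covariance. A tiny numpy sketch on a two-leaf hierarchy, using W = I (which reduces MinT to OLS reconciliation; the real method would plug in an estimated covariance such as the in-sample one):

```python
import numpy as np

# Summing matrix for a two-leaf hierarchy: row 0 is the total, rows 1-2 are the leaves
S = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])

W = np.eye(3)  # identity stand-in for the error covariance (OLS special case)

y_hat = np.array([100.0, 45.0, 50.0])  # incoherent base forecasts: 45 + 50 != 100

Winv = np.linalg.inv(W)
G = np.linalg.inv(S.T @ Winv @ S) @ S.T @ Winv
y_tilde = S @ G @ y_hat  # reconciled forecasts, coherent across levels

print(y_tilde)  # total now equals the sum of the two leaves
```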

Method

  • Back-tested all plausible models
  • Fitted values were rounded
  • Negative fitted values were changed to zero
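The rounding and zero-flooring steps amount to a one-line post-processing pass; a sketch in Python rather than the actual R code, with made-up fitted values:

```python
def postprocess(fitted):
    """Round fitted values to whole units and floor negatives at zero."""
    return [max(0, round(v)) for v in fitted]

print(postprocess([2.6, -1.3, 0.4]))  # -> [3, 0, 0]
```

Note that unit sales can't be negative or fractional, which is why both steps are applied.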

Models tested

  • stepwise ARIMA (using the fable package, fits the best ARIMA based on a chosen metric)
  • ETS
  • RandomWalk
  • SeasonalNaive
  • Croston
  • STLs
    • Season 7, trend 51
    • Season 7, trend 3
    • Season 7, trend 121
    • Season 7, trend 91
    • fable default
  • Poisson regression
  • Zero inflated Poisson regression
  • XGBoost

Sources

Data: https://github.com/Mcompetitions/M5-methods (accessed December 1st)

Math and major packages:

Hyndman, R.J., & Athanasopoulos, G. (2021) Forecasting: principles and practice, 3rd edition, OTexts: Melbourne, Australia. OTexts.com/fpp3. Accessed on October 10, 2023.

Mitchell O’Hara-Wild, Rob Hyndman and Earo Wang (2023). fable: Forecasting Models for Tidy Time Series. R package version 0.3.3. https://CRAN.R-project.org/package=fable

Wickramasuriya, S. L., Athanasopoulos, G., & Hyndman, R. J. (2019). Optimal forecast reconciliation for hierarchical and grouped time series through trace minimization. Journal of the American Statistical Association, 114(526), 804–819. DOI

To see my code for this, check out my notebook here